[57] ABSTRACT

A speech signal is divided into frames, each frame having a
sound type, and a class is determined for each frame
depending on the sound type of the frame. One of multiple
filters is selected for each frame depending on the class of
the frame. Each frame is filtered according to the filter
selected, and the filtered frames are combined to provide a
filtered speech signal. The system includes filters and
software.

METHOD AND SYSTEM FOR REGION
BASED FILTERING OF SPEECH

RELATED APPLICATION 

This application is related to U.S. patent application ser. 
No. 08/695,097, which was filed on the same date and 
assigned to the same assignee as the present application. 

TECHNICAL FIELD 

This invention relates to an adaptive method and system 
for filtering speech signals. 

BACKGROUND ART 

In wireless communications, background noise and static 
can be annoying in speaker to speaker conversation and a 
hindrance in speaker to machine recognition. As a result, 
noise suppression is an important part of the enhancement of
speech signals recorded over wireless channels in mobile 
environments. 

In that regard, a variety of noise suppression techniques
have been developed. Such techniques typically operate on 
single microphone, output-based speech samples which 
originate in a variety of noisy environments, where it is 
assumed that the noise component of the signal is additive 
with unknown coloration and variance. 

One such technique is Least Mean-Squared (LMS) Pre- 
dictive Noise Cancelling. In this technique it is assumed that 
the additive noise is not predictable, whereas the speech 
component is predictable. LMS weights are adapted to the 
time series of the signal to produce a time-varying matched 
filter for the predictable speech component such that the 
mean-squared error (MSE) is minimized. The estimated
clean speech signal is then the filtered version of the time 
series. 
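The LMS predictive idea above can be sketched in a few lines. This is a minimal illustration only, not the implementation of any cited system: the tap count, the step size `mu`, and the sinusoidal test signal are assumptions chosen for the example.

```python
import math
import random


def lms_predict(signal, taps=8, mu=0.01):
    """One-step LMS prediction of each sample from its own past.

    The predictable part is taken as the speech estimate; the
    unpredictable residual is treated as noise. `taps` and `mu`
    are illustrative values, not values from the source.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent sample first
    estimate = []
    for sample in signal:
        predicted = sum(w * h for w, h in zip(weights, history))
        error = sample - predicted
        # LMS update: step along the negative gradient of the squared error.
        weights = [w + 2.0 * mu * error * h for w, h in zip(weights, history)]
        history = [sample] + history[:-1]
        estimate.append(predicted)
    return estimate


def mse(a, b):
    """Mean-squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Assumed test data: a predictable sinusoid plus unpredictable white noise.
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(4000)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]
estimate = lms_predict(noisy)

# Compare errors over the second half, after the weights have settled.
half = len(clean) // 2
err_noisy = mse(noisy[half:], clean[half:])
err_lms = mse(estimate[half:], clean[half:])
```

On a stationary sinusoid the prediction lands closer to the clean signal than the noisy input does. Real speech is far less stationary, which is the limitation discussed in the next paragraph.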

However, the structure of speech in the time domain is 
neither coherent nor stationary enough for this technique to 
be effective. A trade-off is therefore required between fast 
settling time, good tracking ability and the ability to track 
everything (including noise). This technique also has difficulty
with relatively unstructured non-voiced segments of
speech. 

Another noise suppression technique is Signal Subspace
(SSP) filtering (which here includes Spectral Subtraction
(SS)). SSP is essentially a weighted subspace fitting applied
to speech signals, or a set of bandpass filters whose outputs
are linearly weighted and combined. SS involves estimating
the (additive) noise magnitude spectrum, typically done
during non-speech segments of data, and subtracting this
spectrum from the noisy speech magnitude spectrum to
obtain an estimate of the clean speech spectrum. If the
resulting spectral estimate is negative, it is rectified to a
small positive value. This estimated magnitude spectrum is
then combined with the phase information from the noisy
signal and used to construct an estimate of the clean speech
signal.
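The spectral subtraction steps just described (subtract the noise magnitude, rectify negative results to a small positive value, reuse the noisy phase) can be sketched as follows. The naive DFT, the `floor` value, and the two-tone test frame are assumptions made for illustration; the noise estimate here comes from an idealized noise-only frame.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_magnitude, floor=1e-3):
    """Subtract a noise magnitude spectrum and keep the noisy phase."""
    cleaned = []
    for value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = abs(value) - noise_mag
        if magnitude < 0.0:
            magnitude = floor  # rectify negative estimates to a small positive value
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)


# Assumed test frame: a "speech" tone in bin 4 plus a narrowband "noise" tone in bin 10.
n = 32
clean = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
noise = [0.5 * math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]
noise_magnitude = [abs(v) for v in dft(noise)]  # estimate from a noise-only frame
estimate = spectral_subtract(noisy, noise_magnitude)
max_error = max(abs(e - c) for e, c in zip(estimate, clean))
```

The example uses narrowband noise deliberately, since that is the case in which the text says SS behaves well.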

SSP assumes the speech signal is well-approximated by a 
sum of sinusoids. However, speech signals are rarely simply 
sums of undamped sinusoids and can, in many common 
cases, exhibit stochastic qualities (e.g., unvoiced fricatives). 
SSP relies on the concept of bias-variance trade-off. For 
channels having a Signal-to-Noise Ratio (SNR) less than 0
dB, some bias is permitted to give up a larger dosage of
variance and obtain a lower overall MSE. In the speech case, 
the channel bias is the clean speech component, and the 
channel variance is the noise component. However, SSP 
does not deal well with channels having SNR greater than 
zero.

In addition, SS is undesirable unless the SNR of the
associated channel is less than 0 dB (i.e., unless the noise
component is larger than the signal component). For this
reason, the ability of SS to improve speech quality is
restricted to speech masked by narrowband noise. SS is best
viewed as an adaptive notch filter which is not well
applicable to wideband noise.

Still another noise suppression technique is Wiener 
filtering, which can take many forms including a statistics- 
based channel equalizer. In this context, the time domain 
signal is filtered in an attempt to compensate for non- 
uniform frequency response in the voice channel. Typically, 
this filter is designed using a set of noisy speech signals and 
the corresponding clean signals. Taps are adjusted to opti- 
mally predict the clean sequence from the noisy one accord- 
ing to some error measure. Once again, however, the struc- 
ture of speech in the time domain is neither coherent nor 
stationary enough for this technique to be effective. 

Yet another noise suppression technique is Relative
Spectral speech processing (RASTA). In this technique, multiple
filters are designed or trained for filtering spectral subbands.
First, the signal is decomposed into N spectral subbands
(currently, Discrete Fourier Transform vectors are used to
define the subband filters). The magnitude spectrum is then
filtered with N/2+1 linear or non-linear neural-net subband
filters.

However, the characteristics of the complex transformed 
signal (spectrum) have been elusive. As a result, RASTA 
subband filtering has been performed on the magnitude
spectrum only, using the noisy phase for reconstruction.
However, an accurate estimate of phase information gives 
little, if any, noticeable improvement in speech quality. 

The dynamic nature of noise sources and the non-stationary
nature of speech ideally call for adaptive techniques to
improve the quality of speech. Most of the existing
noise suppression techniques discussed above, however, are
not adaptive. Such adaptation can be performed in various
dimensions and at various levels. One type of adaptation,
where importance is given to noise characteristics and level,
is based on the level of noise and the level of distortion in
a speech signal. However, for a given noise level, adaptation
can also be done based on speech characteristics. The best
solution is adaptation based simultaneously on noise
characteristics as well as speech characteristics. While some
recently proposed techniques are designed to adapt to the
noise level or SNR, none take into account the non-stationary
nature of speech and try to adapt to different sound
categories.

An article by Harris Ducker entitled "Speech Processing
in a high ambient noise environment", IEEE Trans. Audio
and Electroacoustics, Vol. 16, No. 2, June, 1968, pp.
165-168, discusses the effect of noise on different speech
sounds and the resulting confusion among sound categories.
While a high-pass filter is employed in an effort to resolve
this confusion, such a filter is only used for some sound
categories. Moreover, the classification of sound in this
technique is only done manually by experiment.

Thus, there exists a need for a noise suppression technique
which would automatically classify sounds and apply an
appropriate filter for each class. Moreover, such a technique
would use filtering that adapts to speech sounds.

DISCLOSURE OF INVENTION

Accordingly, it is the principal object of the present
invention to provide an improved method and system for
filtering speech signals.

According to the present invention, then, a method and
system are provided for adaptively filtering a speech signal.
The method comprises dividing the signal into a plurality of
frames, each frame having one of a plurality of sound types
associated therewith, and determining one of a plurality of
classes for each frame, wherein the class determined
depends on the sound type associated with the frame. The
method further comprises selecting one of a plurality of
filters for each frame, wherein the filter selected depends on
the class of the frame, and filtering each frame according to
the filter selected. The method still further comprises
combining the plurality of filtered frames to provide a filtered
speech signal.

The system of the present invention for adaptively
filtering a speech signal comprises means for dividing the signal
into a plurality of frames, each frame having one of a
plurality of sound types associated therewith, and means for
determining one of a plurality of classes for each frame,
wherein the class determined depends on the sound type
associated with the frame. The system further comprises a
plurality of filters for filtering the frames, and means for
selecting one of the plurality of filters for each frame,
wherein the filter selected depends upon the class of the
frame. The system still further comprises means for
combining the plurality of filtered frames to provide a filtered
speech signal.

These and other objects, features and advantages will be
readily apparent upon consideration of the following
detailed description in conjunction with the accompanying
drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1a-b are plots of filterbanks trained at Signal-to-
Noise Ratio values of 0, 10, 20 dB at subbands centered
around 800 Hz and 2200 Hz, respectively;

FIG. 2 is a flowchart of the method of the present
invention; and

FIG. 3 is a block diagram of the system of the present
invention.

BEST MODE FOR CARRYING OUT THE
INVENTION

Improving the quality of speech signals in the presence of
noise requires understanding the characteristics of the noise
source as well as its effects on the speech signal at various
levels and on different regions of the speech signal.
However, it is not feasible to obtain enough samples to study
all possible noise sources.

Traditionally, the Wiener filtering techniques discussed
above have been packaged as a channel equalizer or
spectrum shaper for a sequence of random variables. However,
the subband filters of the RASTA form of Wiener filtering
can more properly be viewed as Minimum Mean-squared
Error Estimators (MMSEE) which predict the clean speech
spectrum for a given channel by filtering the noisy spectrum,
where the filters are pre-determined by training them with
respect to MSE on pairs of noisy and clean speech samples.

In that regard, original versions of RASTA subband filters
consisted of heuristic Autoregressive Moving Average
(ARMA) filters which operated on the compressed
magnitude spectrum. The parameters for these filters were
designed to provide an approximate matched filter for the
speech component of noisy compressed magnitude
spectrums and were obtained using clean speech spectra
examples as models of typical speech. Later versions used
Finite Impulse Response (FIR) filterbanks which were
trained by solving a simple least squares prediction problem,
where the FIR filters predicted known clean speech spectra
from noisy realizations of it.

Assuming that the training samples (clean and noisy) are
representative of typical speech samples and that speech
sequences are approximately stationary across the sample, it
can be seen that a MMSEE is provided for speech magnitude
spectra from noisy speech samples. In the case of FIR
filterbanks, this is actually a Linear MMSEE of the
compressed magnitude spectrum. This discussion can, however,
be extended to include non-linear predictors as well. As a
result, the term MMSEE will be used, even as reference is
made to LMMSEE.

There are, however, two problems with the above
assumptions. First, the training samples cannot be representative of
all noise colorations and SNR levels. Second, speech is not
a stationary process. Nevertheless, MMSEE may be
improved by changing those assumptions and creating an
adaptive subband Wiener filter which minimizes MSE using
specialized filterbanks according to speech region and noise
levels.

In that regard, the design of subband FIR filters is subject
to a MSE criterion. That is, each subband filter is chosen
such that it minimizes squared error in predicting the clean
speech spectra from the noisy speech spectra. This squared
error contains two components: i) signal distortion (bias);
and ii) noise variance. Hence a bias-variance trade-off is
again seen for minimizing overall MSE. This trade-off
produces filterbanks which are highly dependent on noise
variance. For example, if the SNR of a "noisy" sample were
infinite, the subband filters would all be simply $\delta_k$, where

$$\delta_k = \begin{cases} 1, & k = 0 \\ 0, & \text{otherwise} \end{cases}$$

On the other hand, when the SNR is low, filterbanks are
obtained whose energy is smeared away from zero. This
phenomenon occurs because the clean speech spectra is
relatively coherent compared to the additive noise signals.
Therefore, the overall squared error in the least squares
(training) solution is minimized by averaging the noise
component (i.e., reducing noise variance) and consequently
allowing some signal distortion. If this were not true,
[illegible] (with respect to MSE) by filtering
spectral magnitudes of noisy speech.

[illegible] filterbanks which were trained at SNR
values of 0, 10, 20 dB, respectively, are shown in FIG. 1 to
illustrate this point. The first set of filters (FIG. 1a)
corresponds to the subband centered around 800 Hz, and the
second (FIG. 1b) represents the region around 2200 Hz. The
filters corresponding to lower SNR's (in FIG. 1, the
filterbanks for the lower SNR levels have center taps which are
similarly lower) have a strong averaging (lowpass)
capability in addition to an overall reduction in gain.

With particular reference to the filterbanks used at 2200
Hz (FIG. 1b), this region of the spectrum is a low-point in
the average spectrum of the clean training data, and hence
the subband around 2200 Hz has a lower channel SNR than
the overall SNR for the noisy versions of the training data.
So, for example, when training with an overall SNR of 0 dB,
the subband SNR for the band around 2200 Hz is less than
0 dB (i.e., there is more noise energy than signal energy). As
a result, the associated filterbank, which was trained to
minimize MSE, is nearly zero and effectively eliminates the
channel.

Significantly, if the channel SNR cannot be brought above
0 dB by filtering the channel, overall MSE can be improved
by simply zeroing the channel. To pre-determine the
post-filtered SNR, three quantities are needed: i) an initial
(pre-filtered) SNR estimate; ii) the expected noise reduction due
to the associated subband filter; and iii) the expected
(average) speech signal distortion introduced by the filter. For
example, if the channel SNR is estimated to be -3 dB, the
associated subband filter's noise variance reduction
capability at 5 dB, and the expected distortion at -1 dB, a positive
post-filtering SNR is obtained and the filtering operation
should be performed. Conversely, if the pre-filtering SNR
was instead -5 dB, the channel should simply be zeroed.
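The worked figures in the preceding paragraph can be checked with a few lines of arithmetic. This sketch assumes the three quantities simply add in decibels, with distortion expressed as a negative number as in the text; the function names are hypothetical.

```python
def post_filter_snr_db(pre_snr_db, noise_reduction_db, distortion_db):
    """Predicted SNR after subband filtering (assumed additive in dB)."""
    return pre_snr_db + noise_reduction_db + distortion_db


def channel_action(pre_snr_db, noise_reduction_db, distortion_db):
    """Filter the channel if it can be brought above 0 dB, otherwise zero it."""
    predicted = post_filter_snr_db(pre_snr_db, noise_reduction_db, distortion_db)
    return "filter" if predicted > 0.0 else "zero"


# Figures from the text: 5 dB noise reduction, -1 dB distortion.
first = channel_action(-3.0, 5.0, -1.0)   # -3 + 5 - 1 = +1 dB
second = channel_action(-5.0, 5.0, -1.0)  # -5 + 5 - 1 = -1 dB
```

With a pre-filtering SNR of -3 dB the predicted result is +1 dB, so the channel is filtered. At -5 dB the prediction is -1 dB, so the channel is zeroed.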

The above discussion assumes that an estimator of
subband SNR is available. This estimator must be used for the
latter approach of determining the usefulness of a channel's
output as well as for adaptively determining which subband
filter should be used. In that regard, an SNR estimation
technique well known in the art which uses the bimodal
characteristic of a noisy speech sample's histogram to
determine the expected values of signal and noise energy
may be used. However, accurately tracking multiple
(subband) SNR estimates is difficult since instantaneous
SNR for speech signals is a dramatically varying quantity.
Hence, the noise spectrum, which is a relatively stable
quantity, may instead be tracked. This estimate may then be
used to predict the localized subband SNR values. The
bimodal idea of the known SNR estimation technique
described above may still contribute as a noise spectrum
estimate.
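One way to picture tracking the noise spectrum and predicting subband SNR from it is sketched below. The recursive averaging, the `smoothing` constant, and the speech/non-speech flag are assumptions made for illustration; they stand in for the histogram-based estimator mentioned above, which is not reproduced here.

```python
import math


def update_noise_power(noise_power, frame_power, is_speech, smoothing=0.9):
    """Slowly track per-subband noise power, updating only on non-speech frames."""
    if is_speech:
        return list(noise_power)
    return [smoothing * n + (1.0 - smoothing) * p
            for n, p in zip(noise_power, frame_power)]


def subband_snr_db(frame_power, noise_power, floor=1e-12):
    """Predict per-subband SNR from frame power and the tracked noise power."""
    snrs = []
    for p, n in zip(frame_power, noise_power):
        signal = max(p - n, floor)  # additive-noise assumption
        snrs.append(10.0 * math.log10(signal / max(n, floor)))
    return snrs


# Assumed per-subband powers for one frame and a tracked noise estimate.
tracked = update_noise_power([1.0, 1.0], [3.0, 1.0], is_speech=False)
snrs = subband_snr_db([11.0, 2.0, 1.5], [1.0, 1.0, 3.0])
```

The noise estimate changes slowly while the frame power changes quickly, so the predicted SNR follows the speech without the tracker having to.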

Thus, speech distortion is allowed in exchange for
reduced noise variance. This is achieved by throwing out
channels whose SNR is less than 0 dB and by subband
filtering the noisy magnitude spectrum. Noise averaging
gives a significant reduction in noise variance, while
effecting a lesser amount of speech distortion (relative to the
reduction in noise variance). Subband filterbanks are chosen
according to the SNR of a channel, independent of the SNR
estimate of other channels, in order to adapt to a variety of
noise colorations and variations in speech spectra. By
specializing sets of filterbanks for various SNR levels,
appropriate levels for noise variance reduction and signal
distortion may be adaptively chosen according to subband SNR
estimates to minimize overall MSE. In such a fashion, the
problem concerning training samples which cannot be
representative of all noise colorations and SNR levels is solved.
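Training a subband filter at a given SNR level, as the specialization above implies, reduces to a small least-squares problem. The sketch below is an assumption-laden illustration: a 3-tap filter, a synthetic zero-mean trajectory standing in for one subband's compressed magnitude over time, and a hand-written solver. It shows the two regimes described earlier: at infinite SNR the solution is a delta, and at low SNR the taps spread out and the center tap shrinks.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def train_fir(noisy, clean, half_width=1):
    """Least-squares taps predicting clean[t] from a window of noisy samples."""
    size = 2 * half_width + 1
    gram = [[0.0] * size for _ in range(size)]
    cross = [0.0] * size
    for t in range(half_width, len(noisy) - half_width):
        window = noisy[t - half_width:t + half_width + 1]
        for i in range(size):
            cross[i] += window[i] * clean[t]
            for j in range(size):
                gram[i][j] += window[i] * window[j]
    return solve(gram, cross)  # normal equations: gram * taps = cross


random.seed(0)
# Assumed stand-in for a coherent clean subband trajectory (AR(1) process).
clean = [0.0]
for _ in range(1999):
    clean.append(0.9 * clean[-1] + random.gauss(0.0, 1.0))
# Additive noise at roughly 0 dB SNR relative to the trajectory above.
noisy = [c + random.gauss(0.0, 2.3) for c in clean]

taps_infinite_snr = train_fir(clean, clean)  # "noisy" input equals clean input
taps_low_snr = train_fir(noisy, clean)
```

Repeating `train_fir` on training pairs prepared at each SNR level of interest would give one filter per level. How those levels are chosen is not specified beyond the 0, 10 and 20 dB sets shown in FIG. 1.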

However, speech non-stationarity also poses a difficult
barrier for many noise suppression techniques. Recall that
one of the problems with the LMS Predictive technique is
sufficiently tracking changes in the speech signal without
tracking everything (including the noise component of the
signal). A significant hindrance to SSP is that, while some
regions (e.g. vowels) are well-approximated by a reduced
rank model (that is, vowels typically exhibit peaked
spectrums whose valleys represent subband areas which can be
thrown out due to low subband SNR), unvoiced fricatives do
not. The result of running SSP on a speech signal without
regard for a region-based analysis is a processed signal
whose unvoiced regions sound musical or whistle-like.

It can be empirically assumed, however, that a sequence
of many speech phonemes, each from a common class (e.g.
vowels or nasals), is more stationary than a typical speech
sample consisting of all phonemes, such as conversational
speech. The present invention uses this assumption to
provide improved noise suppression and may be described as
filtering of noisy speech based on the type of speech sound
in the signal.

For example, to train a set of filterbanks for the class of
nasals, a classifier (rough speech recognizer) is first built
which detects nasal frames in the time domain and marks
them. Such a classifier must be robust across noisy
environments. Next, the filterbanks are trained across various
noise levels as discussed above, using only those frames
marked as "nasal" frames. The resulting filterbank set is then
used for noise suppression whenever the region classifier
indicates a nasal region. This training process would also be
performed for other classes of speech such as vowels, glides,
fricatives, etc.

The present invention thus provides a multi-resolution 
speech recognizer which uses region-based filtering to
obtain finer resolution phoneme estimates within a class of 
phonemes. This is accomplished generally by estimating the 
class of phoneme, filtering with the appropriate filterbank, 
and performing a final phoneme detection, where the search 
is limited to the particular class in question (or at least 
weighted heavily in favor of it). 

Referring now to FIG. 2, a flowchart of the method of the 
present invention is shown. As seen therein, the method 
comprises dividing (10) a corrupted speech signal into a 
plurality of frames, each frame having one of a plurality of 
sound types associated therewith, and determining (12) one 
of a plurality of classes for each frame, wherein the class 
determined depends on the sound type associated with the 
frame. The method further comprises selecting (14) one of 
a plurality of filters for each frame, wherein the filter
selected depends on the class of the frame, and filtering (16) 
each frame according to the filter selected. The method still 
further comprises combining (18) the plurality of filtered 
frames to provide a filtered speech signal. 
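The five steps of FIG. 2 can be sketched as a simple dispatch loop. Everything specific here is an assumption for illustration: the energy and zero-crossing classifier is a toy stand-in for the trained classifier, the per-class taps are made up, the filters act on time-domain samples rather than on subband magnitude spectra, and frames do not overlap.

```python
def classify(frame):
    """Hypothetical stand-in classifier based on energy and zero crossings."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < 1e-4:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return "fricative" if crossings > len(frame) / 4 else "vowel"


# One illustrative FIR filter per class (made-up taps, not trained filterbanks).
FILTERS = {
    "silence": [0.0],             # suppress the frame
    "vowel": [0.25, 0.5, 0.25],   # averaging (lowpass) filter
    "fricative": [1.0],           # pass through unchanged
}


def fir(frame, taps):
    """Apply centered FIR taps within a frame, treating outside samples as zero."""
    half = len(taps) // 2
    out = []
    for t in range(len(frame)):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = t + k - half
            if 0 <= idx < len(frame):
                acc += tap * frame[idx]
        out.append(acc)
    return out


def filter_signal(signal, frame_len):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal), frame_len)]  # dividing (10)
    output = []
    for frame in frames:
        label = classify(frame)           # determining a class (12)
        taps = FILTERS[label]             # selecting a filter (14)
        output.extend(fir(frame, taps))   # filtering (16) and combining (18)
    return output


signal = [0.0] * 8 + [1.0, -1.0] * 4 + [1.0] * 8
result = filter_signal(signal, frame_len=8)
```

The point of the sketch is the control flow: the class label, not the signal level alone, decides which filter a frame receives.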

It should be noted that the method of the present invention 
may include two stages. During a training stage, filter 
parameters are estimated for the filters based on clean 
speech signals. Actual filtering is performed during a noise 
suppression stage. A broad category classifier is used to 
classify each frame of speech signal into an acoustic cat- 
egory. Sound categories for classifying each frame prefer- 
ably include silence, fricatives, stops, vowels, nasals, glides
and other non-speech sounds. In the preferred embodiment, 
artificial neural networks are trained to perform this classi- 
fication. 

It should also be noted that the noisy signal is filtered 
across the frames using the specific filter designed for the 
particular speech sound category to which that frame 
belongs. That is, different filters are designed for each 
acoustic class and an appropriate filter from a filterbank is 
applied to each frame of speech based on the output of the 
classifier. The frames themselves are portions of the cor- 
rupted speech signal from the time domain and have a 
pre-selected period, preferably 32 msec with 75% overlap. 
However, frame size may also be adaptively chosen to match 
the class of sound type. 
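The preferred framing (32 msec frames with 75% overlap) can be illustrated with a short framing and overlap-add sketch. The 8 kHz sample rate and the Hann window are assumptions; the source specifies neither.

```python
import math


def frame_params(sample_rate_hz, frame_ms=32.0, overlap=0.75):
    """Frame length and hop size in samples for the given frame period."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    return frame_len, hop


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping, Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    return [[w * x for w, x in zip(window, signal[start:start + frame_len])]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def overlap_add(frames, frame_len, hop):
    """Recombine frames by summing them at their original positions."""
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * hop + n] += value
    gain = frame_len / (2.0 * hop)  # sum of the overlapping Hann windows
    return [v / gain for v in out]


frame_len, hop = frame_params(8000)  # assumed 8 kHz sampling
signal = [math.sin(0.01 * n) + 0.3 * math.sin(0.37 * n) for n in range(1024)]
frames = split_frames(signal, frame_len, hop)
rebuilt = overlap_add(frames, frame_len, hop)
# Samples covered by a full set of four overlapping frames are recovered.
inner_error = max(abs(rebuilt[n] - signal[n])
                  for n in range(frame_len - hop, len(rebuilt) - frame_len + hop))
```

With no filtering applied between the two steps, the interior of the signal is reconstructed to within rounding error. In the method itself each frame would be filtered between `split_frames` and `overlap_add`.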

Referring next to FIG. 3, a block diagram of the system 
of the present invention is shown. As seen therein, a cor- 
rupted speech signal (20) is transmitted to a decomposer 
(22). As previously discussed with respect to the method of 
the present invention, decomposer (22) divides speech sig- 
nal (20) into a plurality of frames, each frame having one of 
a plurality of sound types associated therewith. 

As discussed above, speech signal (20) is preferably a
time domain signal. The plurality of frames are then portions
of speech signal (20) having pre-selected time periods,
preferably 32 msec. As also discussed above, the plurality of
sound types associated with the frames preferably includes
silence, fricatives, stops, vowels, nasals, glides and other
non-speech sounds. A neural network is preferably used to
perform the classification.

Still referring to FIG. 3, decomposer (22) generates a
decomposed speech signal (24) which is transmitted to a
classifier (26) and a filter bank (28). Once again, as
previously discussed with respect to the method of the present
invention, classifier (26) determines one of a plurality of
classes for each frame, wherein the class determined
depends on the sound type and noise level associated with
the frame.

Depending on the class of the frame, classifier (26) also
selects one of a plurality of filters from filterbank (28) for
that frame. As previously discussed, the plurality of filters
from filterbank (28) may be pre-trained using clean speech
signals. Moreover, while any type of classifier (26) well
known in the art may be used, classifier (26) preferably
comprises a neural network. The parameters of the neural
network are estimated by training the neural network with
hand-segmented clean as well as noisy speech samples. An
estimator may also determine a speech quality indicator for
each class in each subband. Preferably, such a quality
indicator is an estimated SNR.

After each frame is filtered at filterbank (28) according to
the filter selected therefor by classifier (26), a filtered
decomposed speech signal (30) is transmitted to a
reconstructor (32). Reconstructor (32) then re-combines the
filtered frames in order to construct an estimated clean speech
signal (34). As those of ordinary skill in the art will
recognize, the system of the present invention also includes
appropriate software for performing the above-described
functions.

As is readily apparent from the foregoing description,
then, the present invention provides an improved method
and system for filtering speech signals. More specifically,
the present invention thus provides an adaptable method and
system for noise suppression based on speech regions (e.g.
vowels, nasals, glides, etc.) and noise level which is
optimized in terms of bias-variance trade-offs and statistical
stationarity. This approach also provides for multi-resolution
speech recognition which uses noise suppression as a
pre-processor.

As is also readily apparent, the present invention can be
applied to speech signals to adaptively filter the noise and
improve the quality of speech. A better quality service will
result in improved satisfaction among cellular and Personal
Communication System (PCS) customers. The present
invention can also be used as a preprocessor in speech
recognition for noisy speech. Moreover, the broad
classification of the present invention can be used in a speech
recognizer as a multi-resolution feature identification
process.

While the present invention has been described in
conjunction with wireless communication, those of ordinary
skill in the art will recognize its utility in any application
where noise suppression is desired. In that regard, it is to be
understood that the present invention has been described in
an illustrative manner and the terminology which has been
used is intended to be in the nature of words of description
rather than of limitation. As previously stated, many
modifications and variations of the present invention are possible
in light of the above teachings. Therefore, it is also to be
understood that within the scope of the following claims, the
invention may be practiced otherwise than as specifically
described.

We claim:

1. A method for adaptively filtering a speech signal, the
method comprising:

dividing the signal into a plurality of frames, each frame
having one of a plurality of sound types associated
therewith;

determining one of a plurality of classes for each frame,
wherein the class determined depends on the sound
type associated with the frame;

selecting one of a plurality of filters for each frame,
wherein the filter selected depends on the class of the
frame;

filtering each frame according to the filter selected; and

combining the plurality of filtered frames to provide a
filtered speech signal.

2. The method of claim 1 further comprising estimating
parameters for the plurality of filters based on a clean
speech signal.

3. The method of claim 1 wherein each of the plurality of
filters is associated with one of the plurality of classes for the
frames.

4. The method of claim 1 wherein the plurality of filters
comprises a filter bank.

5. The method of claim 1 wherein the speech signal is a
time domain signal and each of the plurality of frames
comprises a portion of the signal, each portion having a
preselected time period.

6. The method of claim 1 wherein the speech signal is a
time domain signal and each of the plurality of frames
comprises a portion of the signal, each portion having a
variable time period.

7. The method of claim 1 wherein the plurality of sound
types comprises speech and non-speech sounds.

8. The method of claim 1 wherein the plurality of sound
types comprises silence, fricatives, stops, vowels, nasals,
and glides.

9. The method of claim 8 wherein the plurality of sound
types further comprises other non-speech sounds.

10. A system for adaptively filtering a speech signal, the
system comprising:

means for dividing the signal into a plurality of frames,
each frame having one of a plurality of sound types
associated therewith;

means for determining one of a plurality of classes for
each frame, wherein the class determined depends on
the sound type associated with the frame;

a plurality of filters for filtering the frames;

means for selecting one of the plurality of filters for each
frame, wherein the filter selected depends on the class
of the frame; and

means for combining the plurality of filtered frames to
provide a filtered speech signal.

11. The system of claim 10 further comprising means for
estimating parameters for the plurality of filters based on a
clean speech signal.

12. The system of claim 10 wherein each of the plurality
of filters is associated with one of the plurality of classes for
the frames.

13. The system of claim 10 wherein the plurality of filters
comprises a filter bank.

14. The system of claim 10 wherein the speech signal is
a time domain signal and each of the plurality of frames
comprises a portion of the signal, each portion having a
preselected time period.

15. The system of claim 10 wherein the speech signal is
a time domain signal and each of the plurality of frames
comprises a portion of the signal, each portion having a
variable time period.

16. The system of claim 10 wherein the plurality of sound
types comprises speech and non-speech sounds.

17. The system of claim 10 wherein the plurality of sound
types comprises silence, fricatives, stops, vowels, nasals,
and glides.

18. The system of claim 17 wherein the plurality of sound
types further comprises other non-speech sounds.